perm filename PLL.PRO[P,JRA]1 blob
sn#127202 filedate 1974-10-29 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00008 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 PROPOSAL FOR PROG. LANG. LAB.
C00003 00003 rather than compiler, would suggest research into online
C00006 00004 ideas
C00008 00005 outline: programming language lab for data structure manipulation
C00015 00006 WHY LISP-LIKE: lisp, el1 or variant
C00017 00007 INTRODUCTION
C00054 00008 WHY NOT OTHER LANGUAGE
C00055 ENDMK
C⊗;
PROPOSAL FOR PROG. LANG. LAB.
Verifying compiler
research into
on-line verification
extension of axiomatic semantics to new constructs
extension of axiomatic semantics to new programming units
standard techniques and debugging aids
symbolic interpretation
simplification
verification languages
rather than compiler, would suggest research into online
system. since compiler suggsts a degree of "finality" which
is perhaps premature. That is the debugging-verification process
should really occur simultaneously with construction, and with
compilation used only to finalize the correct result.
The only reason for compiling is to increase
speed; ther is no conceptual advantage. Thus the creative work really
precedes compiaation-- perhaps "verifying interpreter" better states the
area which needs investigation.
consider partially verified prog... make change/addition... system should
be able to tell what conditions have changed... parts which were correct
may no longer be correct or may need further qualification.
it is not clear to me that ANY existing language is well suited to
such interactive development..perhaps EL1...LISP is too poor in
data structures; and from only a passing glance, PASCAL, PL/1, ALGOL68,
type languages are too ad-hoc and batch oriented.
REDUCE has rudimentary on-line features but rather muddled semantics it
appears.
Partial evaluation should be available like REDUCE..
form-valued variables are a step in the right direction.
The general question of partially constructed programs and what can be said
about their correctness is difficult. What programming blocks(modules) are
"minimal" that is, what are the blocks which can be used with the assurance
that whatever assertions hold cannot be violated by any actions outside
the block...clearly tied up with global variables....but trying to
write itereesting defintions without globals is difficult.
?symbolic interpretation is just evaluation of form-valued variables
in the absence of any knowledge.? YES!!!!
ideas
should be difference between interactive languages and batch languages
should be difference between numerical languages and non-numerical languages
DIATRIBE ABOUT TIME TO DO PRACTICAL SYSTEM.
NEW PROG CONSTRUCTS: CONTENTION THAT PROG || DATA STRUCTURE: GENERIC, AND ON.
DIATRIBE AGAINST ASSIGNMENT:
NAMES TEMPORARY QUANTITY, NEEDED BECAUSE ALGORITHM IS GETTING TOO COMPLICATED TO
REMEMBER DETAILS.
THROW-BACK TO HARDWARE MACHINE
SHOULD BE ABLE TO DO WITHOUT IF D.S. ARE CAREFULLY DEFINED.
notice implicit difference in struct vs. seq.
diatribe about lisp: everything as tree; and algol seq.
show inadequacy of landin(p42)...TYPES go in structure definition, NAMES go into
abstract syntax which are struct.
eg "closure has a b_v part and a λ_body and an env"
should be closure ::= λ<var><exp> <env> ==> closure = struct[var:bvp;exp:bdy;env:env]
outline: programming language lab for data structure manipulation
introduction
time is ripe to develop such a lab
growing sophistication on programming construction
application to CAI: structured programming techniques
relation to abstract d.s. and VHLL
clearer understanding of verification and construction
this involves primarily interpreters rather than compilers
involves new or modified language features: cf syntax directed compilers for
Fortran vs. Algol, or LISP compiler.
should be interpreted-based rather than compiler-based
should be LISP-like to allow easy manipulation of programs, either by
the user or automatically
what is needed and what should be expected
on-line verification
on-line debugging
that non-trivial programs cannot be expected to be automatically
contructed with current formal systems
that n-t programs cannot be expected to spring bug-free from the
Creator
debugging, like counterexample production in mathematics, is a useful
means of arriving at a correct program.
bugs come in at least two flavors (1) misunderstanding of basic
algorithm; (2) errors in represenation.
type(2) can be alleviated by careful data-structure repesentation
type(1) by program constructs which are close to the intuitive understood
algorithm.
on-line construction
besides obvious simple editing aids, monitoring of contruction can at least
be done; e.g. type checking
bookeeping of simple consequences of construction actions can be done;
e.g.; modification or construction of program can have effect on
extant code and assertions; much of this can be monitored for consistency.
on-line evaluation
use of evaluators is a well know technique for giving credence to
program, indeed it is the desired end result.
on-line simplification
closely related to evaluation is symbolic interpretation, and to
generate anything but a muddle, simplification of result is required
how current approaches are lacking
inadequacy of languages
the three-bear problem: too much or not enough
special requirements for data structued definitions and programming
user defined structures
significance for proofs and verification
carefully chosen operations
significance for proofs and verification
efficient implementation
no consistent study of forms,evaluation and simplification
REDUCE
Boyer-Moore
Lombardi-Raphael
form-valued variables as extension of sub-tree replacement systems
indaequacy of environment
most languages are batch
most languages look the same: numerical and non-numerical
need techniques and ultimately commands for interpreation, partial evaluation
and general control of the construction, debugging and verification process
need proof-checker and simplifier for form-valued evaluation
proposed solution
want a language (subset) which is useable, sufficiently concise that we
can attack provability results with some assurance of success.
structured LISP: a language which allows manipulation of abstract
representation of programs is necessary.
user defined data strucutres
very large part of non-numerical LISP programming involves d.s.
(1) simulation of data structures
(2) (simulated) type checking
separation of d.s. from program is a natural requirement; declaration
of d.s. should percipitate axioms for constuctors, sel, and preds,
programming constructs reflecting data-structuring
the constructs should be tailored to type of ds.
one for structures or records
one for sequences
form valued variables
built-in as part of basic evaluator; this allows program manipulation
the evaluation is a treee transformer with each environment being a
set of local rules
a program checker and simplifier for symbolic interpretation
a sub-tree replacement system
future: a program prover
a clean and consistent prover in the spirit of Boyer-Moore
car-cdr chains should NEVER occur in LISP program.
(1) simulating d.s.
(2) accessing subcomponents
(3) type checking
(4) primitive user functions
WHY LISP-LIKE: lisp, el1 or variant
evaluator-based
program as data: everyone manipulates abstract syntax rep
WHAT IS MISSING FROM LISP
user d.s.
hoare all we need is seq and set and struct takes care of set.
simplification
BENEFITS
extensible
portable
PROBLEM AREAS NEEDING STUDY
ABSTRACT DS
ABSTRACT PROG
ABSTRACT CONTROL: DIRECTLY TIED TO A. D.S.
appears that proper characterization of a.d.s requires some specification of
control. compare "set"; as a ads. we are free to declare it as atype and call
functions, postponing a decision about its rep until later. or are we?
when writing functions about sets, we must use some control structutes of the
language; but common languages have FOR, WHILE,and recursion, all of which
operate on existing (concrete) d.s.. Thus writing algorithm using set vars but
existing control is a fraud, you have already really determined the concrete d.s.
Thus the problem requires that we be able to write algorithms using
abstract data structure AND abstract control, THEN be able to describe
a map of abstract d.s to concrete, and abstract control onto specific control.
PROGRAM IMPROVEMENT ABSTRACT ALG TO EFFICIENT AUTOMATICALLY OR OTHERWISE.
AREAS UNDERSTOOD, BUT NEED TO BE DETAILED
INTRODUCTION
The overall goals of a programming language lab should be the
development of a highly useable system to aid in the creation,
debugging, verifying and execution of programs expressed in a
realistic programming language.
The issue of useability is crucial to the success of this project. A
reasonably knowledgeable user must be comfortable with the interface
which will be used to manipulate programs of the programming
language. Providing unwieldly, clumsy, or stilted control is
unacceptable. The responses provided by the control program must also
provide information is a form understandable by the programmer.
Giving him information about his program in a form which is not
immediately related to his programming constructs is unacceptable.
(compare clausal OUTPUT for a theorem-prover which accepts "natural
INPUT".)
Similarly the programming language which we will use as a test
vehicle must be sufficiently powerful so as to allow construction of
"reasonably natural" programs. Certainly this second criterion can
be satisfied simply by providing one of the currently available
languages. However this seems unacceptable. All current programming
languages allow the construction of programs which far outstrip our
abilities analyze their formal properties. We can too easily
construct programs which are so ill-structured or opaque that
analysis of their correctness is not practical. It seems more to the
point then to develop a powerful dialect of some existing language,
representing a subset which is amenable to proof techniques. Then, as
our understanding of verification increases and our techniques for
program construction improve, we can make extensions of the original
language. However the original dialect should be chosen so that we
can speak with some assurance about the properties of programs
contained in it.
Compare the development of programming language syntax and syntax
directed translation. The syntax of Fortran was developed in a rather
ad hoc manner, before an understanding of syntax-directed
translation. Thus compilers for Fortran tend to contain many special
tests for syntax anomalies, and generally tend to be less
"structured" than compilers for Algol-like languages. Algol
translators profitted from an understanding of formal methods for
syntax translation; these translators are more "structured" and
easier to write. The expressive power of Algol did not suffer because
we imposed some requirements on the design of the syntax. Similarly,
"semantic" (better, "pragmatic") restrictions have been included in
several languages; a common case being to maintain a stack-like
discipline in the activation of program modules. It therefore does
not seem unreasonable to consider "certification" restrictions when
designing languages. If we expect to submit programs to proof
techniques it does not seem at all unnatural to require that the
language constructs match the power of our formal methods, any more
than requiring that the syntax match our ability to produce efficient
parsings or that programming constructs have efficient run-time
representations. Already such realizations are apparent on a small
scale; compare the parameter passing restrictions of Pascal.
To unduly raise user-expectation by supplying a language which far
outstrips our ability to semantically analyze its programs will lead
to immediate and valid criticism. Just as allowing syntactic
contructs which require an undue amount of processing to discover the
parse are discouraged by syntax directed methods.
This comparison of syntax and semantics in the problem of language
design is not unfair, we believe. The active participation of
semantic methods in the process of language design is much overdue.
If the current semantic techniques are not powerful enough to allow
natural description of a useful language and to formulate reasonable
proofs, then we should know it. We should no longer be satisfied to
simply apply semantics to the analysis of existing languages.
What are desireable features of a programming language lab? Ideally
we would wish to have a cradle-to-grave programming environment in
which a user can (1) describe an algorithm, (2) construct a program,
(3) experiment with and (4) debug that program and (5) modify it if
desired; (6) verify that the program realizes the described
algorithm, (7) ask for improvements to the program, and (8) compile
and finally (9) execute the correct program. One could further
desire that (10) an automatic program generator be available to take
the initial algorithmic description and present the user with a
complete program, guaranteed to be correct and efficient.
The applications of such a language system extend beyond simply a
programmer's aide. Such a programming environment should be an
invaluable aid to students learning the language. Construction of
programs in the careful manner which this lab supports, should
reinforce the concepts of structured programming. the addition to
the basic lab of a Monitor-Helper module could bring Computer Aided
Instruction in programming languages far beyond the current
"programmed textbook" approach.
Now to add more detail to the preceding ten points.
(1) First, the language presented for the description of the
algorithm and data structures must be approachable. It should include
facilities for describing data structures. The description facility
should be general and easy to use. It should also allow the user to
specify input/output conventions for his data structures. As an
accompanying feature, the language should contain strong typing.
Automatic coercion is unacceptable, but again the user should be
allowed to specify type conversion conventions. Thus execution of
any computations should require type checking. A significant number
of programming errors are the result of computations with mismatched
types.
Currently the best manifestation of user-defined data structures is
EL1, though the ideas are present in LISP, and have been rediscovered
in the interim by many people. Wirth, Hoare.
A proper characterization of an abstract data structures entails a
discussion of abstract control structures. That is, control
structures present in a programming language are concrete; they
operate on the concrete data structures of the language. If we are to
separate data structures from their representation, then we must
likewise be able to separate control from acting on specific
representations. For example, if we wish to use set-valued variables,
then we most likely will want the set-membership relation; when we
wish finally to map sets onto a data structure, we will need a means
of characterizing membership in terms of existing control structures.
Working will need to be done to give adequate axiomatic semantics for
both data structure and control structure definitions.
(2) Second, in constructing the program the user should at least be
provided with an understanding editor in the spirit INTERLISP. An
understanding bookeeper is also desireable as part of the
construction phase. At the most trivial level this observer can
recognize clashes of function names, incorrect applications of
defined functions, and in a typed language, could warn of
discrepancies in the types of actual parameters, formal parameters
and values.
Consensus says that the INTERLISP system is the best available
programming environment.
(3) To experiment with a program or partially constructed program, we
require an evaluator for expressions in the language. The evaluator
should allow partial evaluation, handling calls to as yet undefined
programs segments, or as yet unbound variables. Certainly part of the
experimentation process will involve the substantial use of a
powerful simplifier in manipulating programs by symbolic
interpretation and in simplification of verification conditions.
A conceptually simple way to approach symbolic interpretation is to
extend the range of program variables. All languages allow variables
ranging over data domains; several languages cater to procedure (or
function-valued) variables. The study of symbolic interpretation is
readily approached by allowing expression-valued variables. No
language contains a systematic study of such variables, though
REDUCE's symbolic mode is a very simple incomplete treatement, and
the work of Boyer-Moore is essentially an attempt at a theorem-prover
over such a symbolic domain. Adding expression- or form-valued
variables as an integral part of the language design is
benifcial(***why***). Simplification of the results of the partial
evaluations is obviously necessary. The pragmatics of such languages
are an extension of sub-tree replacement systems. Whether good sense
can be made of their semantics, is an open question.
Several good symbolic manipulation systems are available, and one on
them, REDUCE, has been used in a verification system.
(4) Debugging -- Non-trivial programs cannot be expected to spring
bug-free from the mind of the creator. It is difficult enough to
write a program with no syntax errors. Debugging is not
counterproductive, but should be a useful tool in obtaining a correct
program. Program debugging can involve either execution of the
program on examples or symbolic evaluation of the program, comparing
the behavior to that expected by the described algorithm, or an
atttempt to verify the correctness of the program using the formal
methods. Program bugs arise from at least two recognizable sources:
(1)misunderstanding of the basic algorithm and (2) errors in
representation. Type 2 bugs can be minimized by use of a careful
data-structuring representation which we will advocate. Type 1 bugs
can be mollified by presenting the user with control structures which
are close to the intuitively understood algorithm. Discover of bugs
will involve modification of the current program.
(5) Program modification involves program construction but also
should require a bit more. Modification of a program will commonly
have an effect on the input/output conditions which currently hold.
Indeed the reason for a modification excited by a bug is to change
the current i/o conditions to conform to those which are expected.
However, program modifications may also affect other i/o assertions.
We should be able to tell a user something about the global effect
which local modification induces on the program.
(6)verification -- As previously mentioned, simplification
techniques will play a crutial role here. We should also make a
concerted effort to develop techniques for higher level verification,
rather than expect proofs to be carried out from first principles. A
beginning is the recent work of Suzuki; see section(7). Also the
introduction of user-defined data structures has a beneficial effect
in allowing verifications on the properties of the data structure
rather than on the properties of the implementation of the data
structure.
(7)improvement -- One necessary component of this comtemplated
programning environment must involve changes of representation.
Having established the correctness of an program which uses abstract
data structures and control structures we must be able to transform
this program into an equivalent one which operates over different
data structures and/or different control structures. As the system
attempts to work the user's tansformations through the structure of
the program, new verification problems will arise.
The most well-known examples of program improvement involve changes
in control structure, replacing recursion with iteration. However,
there are equally troublesome problems in changing of data
structures. A good example of data structure manipulation is
contained in Floyd's TREESORT3. In atempting to decode the behavior
of the program SIFTUP in TREESORT3, it soon is apparent that the
program complexity derives not from the complexity of the algorithms
but form the intricacies of the encoding of a binary tree as an array
with an associated technique for finding successors in the tree. Once
the "real behavior" of SIFTUP is discovered, then giving a convincing
proof of SIFTUP's correctness is feasible. That correctness proof
might involve a proof from first principles as done by Morales, or
might involve a more intuitive verification using macros as done by
Suzuki. However, what is really desireable is to push the
verification process back even further tha Suzuki. We should be able
to describe SIFTUP as a simple (recursive) algorithm operating on an
abstract data structure; we should expect to be able to verify that
this abstract SIFTUP satisfies the necessary I/O relations; and then
we should ask the system to (1) recode SIFTUP without recursion and
(2) ask the system to map the abstract tree-representation onto an
array repesentation with a specified successor relation. The
"natural" abstract SIFTUP is quite straightforward, and a hand
simulation of the transformations involved in obtaining Floyd's
SIFTUP and Hoare's FIND gives encouragement for this approach.
Indeed, the automatic, or semi-automatic improvement of programs is a
natural domain for program manipulation systems. The programmer
wishes to state is problem in its most natural formulation; given the
relations between the programmer's representations and the data
structures and control structures of the programming lab, much of the
work involved in reconcilling the two can be done mechanically.
Indeed we can envision a user describing an abstract algorithm
operating on abstract data structures and control structures; he
should be able to verify the correctness of the abstractions to a
convincing level; and then could use the system to perform
transformations on both the abstractions nad the proofs to hopefully
obtain a lower-level program complete with corresponding correctness
proof.
(8) Techniques of program correctness and verification are applicable
to the generation of more efficient code. Clearly if the compiler can
show that our program is equivalent to one which generates more
efficient code, then that is all to the good. Indeed many of the
current optimization techniques are simply manifestations of
equivalences which were recognized by the compiler writer. The use of
symbolic evaluation and simplification can also be exploited here to
generate more efficient code.
(9)execution -- Execution of program modules is reasonably
straightforward. We must note that during execution we will see
compiled code, interpreted code, partially-specified code, and almost
assuredly we will see bugs. The control programs must cater for these
eventualities.
(10)apg -- Certainly the programming problem would be greatly
simplified if we could specify our algorithm in a suitable formalism
and expect a Program Generator to provide us with a suitable
implementation which was guaranteed to be correct. Much work has
been done in this field and some non-trivial programs can be
generated from formal specifications. Even if we allow some
user-interaction to help guide such programs, current methodology is
not developed to a point were we can expect large complicated
programs to be generated automatically. Sound work on automatic
programming should continue but it would be premature to expect such
projects to make this proposed lab obsolete in the near future.
Rather than dilute our research effort by undertaking an
implementation of a full-fledged language with all its necessary
support programs, we are convinced that a satisfactory system can be
built on a currently implemented language. Thus all of the necessary
components of the language laboratory can be written in a high level
language. Once the structure of the control program and the
programming language are settled, each of the components-- editor,
evaluator, compiler, etc.-- can be programmed as a distinct unit.
What can we say about the structure the programming language? For
descriptive power, easy of modification, portability, and
extensibility it is necessary that the language be able to operate on
an abstract data structure representation (parse tree) of programs.
This data structure representation then is valid input to the
interpreter, the compiler, the editor, the debugger, and the
simplifier. Our experience with the programming language LISP (which
is a data structure representation for M-expressions) confirms the
efficacy of this approach. The most successful enviroments for
program development have been LISP-based. The success derives from
the natural represntation of program-constructs as data items.
Programming languages which allow a natural representation of prgram
components as data items we will call LISP-based. Obviously the word
"natural" is open to interpretation. Thus, although we can map an
Algol program onto a representation as an Algol data object, (most
likely a complicated interrelated collection of arrays) it is not a
natural mapping. The recognition of the power resulting from
representing program as data in a high level language is as important
as the ability to perform that map. LISP was the first such language
to exploit this idea, and thus deserves the credit.
Thus a LISP-like language is essential. Whether that language is a
dialect of LISP, Pascal, or ALGOL68 is irrelevant. What is important
is that we have the ability to manipulate a natural, structured
representation of programs written in that language. So, disregarding
syntactic considerations, what essential features should the language
provide the user?
Certainly some means of granting user-defined data structures is a
necessity. Hoare has remarked that data structuring can be condensed
to manipulations of either sets or sequences. With a slight
modification, we believe that this is a reasonable first step. An
initial class of constructors of data structures which we call
context free is easily available. Basically, abstract representations
are context free if their concrete syntax is specifiable in BNF. We
will also grant simply semantic qualifications as attributes attached
to the BNF. Given the BNF for a representation there is a easy,
natural map onto sequences and sets. This is not a trivial set of
data structures; most typical structures fall into this class. And
since we believe that most programming projects in data structure
manipulation are actually very simple algorithms operating on
complicated data structures, the class of programs which we can write
is also non-trivial. Separating out the data structures from the
control component of a program has at least two benefits.
First, from a purely practical view, the resulting program is much
more readible. It is also much more difficult to write a program
which takes unfair advantage of the data representation.
Second, the implicit relationships between the constructors,
selectors, and recoginzers attached to the data structure, manifest
themselves as axioms of our formal system. This will have benefits in
the verification phases of the program generation. That is we can
verify program properties using our higher level knowledge of the
data structure rather than being forced to estavlish our proofs using
properties of the particular representation. As long as we are unable
to refer to underlying representation our high level reasoning will
be secure. The benefits acruing to such abstractions are well known
at the programming level. The separation of the algorithm from the
particular data representation is one of the principles of structured
programming, extending back to the earliest papers of J. McCarthy.
Abstract data structuring is simply McCarthy's abstract syntax
appearing in s slightly different guise. Abstraction, we believe,
should give a comparable gain in verification efforts. Ideally an
abstraction should form an intuitively acceptible "basis" [luckham &
von Henke] from which the system can be guided to a correct and
efficient concrete program.
We can now give a reasonably accurate description of the mechanics of
the proposed system. The input to the system, both programs and
commands will be parsed to an abstract syntax representation by the
PARSER. When output is requested by the user, either explicitly or
implicitly, the UNPARSER will perform the inverse transformation,
presenting readible output. All other components of the system -- the
command interpreter, the editor, the form evaluator, the simplifier,
the debugger, the verifier, and the compiler -- will all operate on
this abstract syntax tree.
As outlined, the lab is a complex, long-termed project. Many of the
components have been studied separately; it seems high time to
attempt to build a cohesive system, for only in that way will we
really find out how much we know and how much work is left to be
done. Which of the desiderata are realistically attainable? What is
the current state of development of the components of such a system?
First, once the basic structure of the system is designed, each of
the components can be attacked as a reasonably self-contained unit.
Also because of the LISP-like orientation of the language, debugging
and modification of the system should be more tractable. Indeed, one
of the first experiments should be a verification of the system
itself.
Since significant research has been done in most, if not all, of the
ten areas, progress to an initial system should be rapid. What is
clearly missing in current efforts is a concerted attempt to
implement a verification based language lab to see if indeed the
formal techniques can truly become a programmer's assistant.
WHY NOT OTHER LANGUAGE
compiler-based language is inadequate
problems
debugging in machine lang
what does editor operate on?
verifier must operate on other repesentation of prog
simplifier must operate on something else